Enable -fno-reciprocal-math to fix div accuracy. #2413

SanityRemnants · 2025-11-26T08:28:45Z

Added -fno-reciprocal-math to SYCL_KERNEL_OPTIONS so that dividing a number by itself no longer returns a value other than 1, which caused #1895.

Copilot

Pull request overview

This PR addresses a division accuracy issue in SYCL kernels where dividing a number by itself could return non-1 results. The fix adds the -fno-reciprocal-math compiler flag to disable reciprocal math optimizations that can introduce numerical inaccuracies.

Key Changes:

Added -fno-reciprocal-math to SYCL_KERNEL_OPTIONS to prevent compiler optimizations that replace division operations with multiplication by reciprocals
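
The effect described above can be sketched on the host, without a SYCL compiler. The snippet below is only an emulation under stated assumptions: `reciprocal_div` is a hypothetical helper that mimics the rewrite `a / b -> a * (1 / b)` in IEEE double precision, and it is not the code the SYCL compiler actually emits. It lists the small integers for which dividing a number by itself stops returning exactly 1 once the rewrite is applied.

```python
def reciprocal_div(a: float, b: float) -> float:
    """Emulate the reciprocal-math rewrite: a / b becomes a * (1 / b).

    Hypothetical helper for illustration only; it performs two roundings
    (one for 1 / b, one for the product) instead of the single rounding
    of a true division.
    """
    return a * (1.0 / b)


def self_division_mismatches(limit: int = 100) -> list[int]:
    """Return every x in 1..limit where x * (1 / x) is not exactly 1.0."""
    return [
        x
        for x in range(1, limit + 1)
        if reciprocal_div(float(x), float(x)) != 1.0
    ]


mismatches = self_division_mismatches()
print("x * (1 / x) != 1.0 for:", mismatches)

# A true division of a finite non-zero number by itself is always exactly 1.0.
print("x / x == 1.0 for all x:", all(float(x) / float(x) == 1.0 for x in range(1, 101)))
```

The mismatches are off by a single unit in the last place, which is why the issue only shows up when a later operation is sensitive to the exact value.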

riverliuintel · 2025-11-28T08:00:12Z

@CuiYifeng

SanityRemnants · 2025-11-28T10:44:06Z

Currently evaluating the use of a more localized approach using #pragma clang fp reciprocal(off) to minimize the impact on performance. Do not merge for now.

EikanWang · 2025-11-30T03:08:21Z

cmake/BuildFlags.cmake

    # gcc -shared host.o kernel.o device-code.o -o libxxx.so
    set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-sycl-unnamed-lambda)
    set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -sycl-std=2020)
+    set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-reciprocal-math)


Let's study the NVCC behavior first. I think reciprocal-math is a general optimization, and most compilers enable it by default.

@EikanWang I believe --prec-div (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html?highlight=reciprocal#prec-div-true-false-prec-div) is the corresponding NVCC flag. It defaults to true unless --use_fast_math (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html?highlight=reciprocal#use-fast-math-use-fast-math) is set. I do not believe PyTorch is built with --use_fast_math; correct me if I'm wrong.

Thanks for the information. Some HW platforms, like CPU, turn this option on by default. CUDA keeps its default, that is, use_fast_math is false. So it makes sense to add -fno-reciprocal-math. I'm curious why SYCL turns this option on by default.

@fengyuan14 do you know the background?

torch-xpu-ops/cmake/BuildFlags.cmake

Line 77 in 20baf08

# The fast-math will be enabled by default in SYCL compiler.

As I understand it, we should disable fast-math to align with CUDA.

SanityRemnants · 2025-12-01T11:41:39Z

There are a few options for using #pragma clang fp reciprocal(off) (https://clang.llvm.org/docs/LanguageExtensions.html#extensions-to-specify-floating-point-flags), the least invasive being to add it only to the BinaryDivTrueKernel.cpp file. However, I'm not sure it is worth sacrificing some performance for accuracy in this oddly specific use case (calling div first and then trunc instead of div_trunc), for which we have no known business need outside the failing UT. Both the div and div_trunc results are within the required tolerances. It might be better to modify the UT. What is your opinion on that, @EikanWang?
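
To make the "div first and then trunc" case concrete, here is a minimal host-side sketch. It assumes IEEE double precision and uses two hypothetical helpers, `div_then_trunc` and `reciprocal_div_then_trunc`; the second one emulates the reciprocal rewrite by hand and does not reproduce the actual kernel code. It shows how a quotient that is well within tolerance can still truncate to a different integer.

```python
import math


def div_then_trunc(a: float, b: float) -> int:
    """Truncate the result of a correctly rounded division."""
    return math.trunc(a / b)


def reciprocal_div_then_trunc(a: float, b: float) -> int:
    """Truncate the result of the emulated rewrite a * (1 / b).

    Hypothetical helper: stands in for what a reciprocal-math
    optimization may do to a division, for illustration only.
    """
    return math.trunc(a * (1.0 / b))


x = 49.0  # value chosen because x * (1 / x) lands just below 1.0 in double precision

exact = div_then_trunc(x, x)
rewritten = reciprocal_div_then_trunc(x, x)

# The two quotients agree to within any reasonable floating-point tolerance...
within_tolerance = math.isclose(x / x, x * (1.0 / x), rel_tol=1e-9)

# ...but truncation turns the tiny difference into a whole-integer difference.
print("within tolerance:", within_tolerance)
print("trunc(x / x):", exact)
print("trunc(x * (1 / x)):", rewritten)
```

This is why the plain div result can pass an accuracy check while a trunc applied afterwards fails an exact-match comparison.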

EikanWang · 2025-12-03T07:44:52Z

@SanityRemnants, although performance is king, I'd prefer to keep a high bar for maintainability when there is no concrete performance impact.

@SanityRemnants, since we are close to the PyTorch 2.10 branch cut date, I'd like to defer landing this PR.
@weishi-deng, @chuanqi129, could you help collect performance data with this PR? My expectation is that this PR only impacts a few micro-benchmarks and that there should be no impact on E2E models. Priority-wise, this data collection comes after PT 2.10.

NeoZhangJianyu · 2025-12-04T07:28:43Z

There is no such issue on CUDA.
IPEX has added some similar parameters to fix accuracy issues.
It looks like they have not been upstreamed to PyTorch.

This parameter is in neither IPEX nor PyTorch.
It only impacts the ops that involve divide().

We are waiting for this PR to fix a related issue.
Please speed up the review!

Thank you!

add -fno-reciprocal-math to SYCL_KERNEL_OPTIONS

d8fc982

Copilot AI review requested due to automatic review settings November 26, 2025 08:28

Copilot AI reviewed Nov 26, 2025

Merge branch 'main' into kdrozd/div_accuracy

769d591

riverliuintel requested a review from CuiYifeng November 28, 2025 08:00

CuiYifeng requested review from EikanWang and guangyey November 28, 2025 08:35

Merge branch 'main' into kdrozd/div_accuracy

94f8900

EikanWang reviewed Nov 30, 2025

Merge branch 'main' into kdrozd/div_accuracy

0d796ab

Merge branch 'main' into kdrozd/div_accuracy

876abd9
